Upstream Steps
This notebook explores the parameters of the neighborhood graph and the UMAP embedding (number of PCs, number of neighbors, distance metric) on normalized, Harmony-integrated data.
import numpy as np
import pandas as pd
import seaborn as sns
import igraph as ig
import matplotlib.pyplot as plt
from datetime import datetime
from scipy.sparse import csr_matrix, isspmatrix
import scanpy as sc
import scanpy.external as sce
sc.settings.verbosity = 3
sc.settings.set_figure_params(dpi=80)
#results_file = '/home/..../brainomics/Dati/3_AdataDimRed.h5ad'
results_file = '/group/brainomics/Intermediate/3.1_UMAP_param.h5ad'
print(datetime.now())
2022-11-26 19:01:00.537511
adata = sc.read('/group/brainomics/Intermediate/2_AdataNorm.h5ad')
print('Loaded Normalized AnnData object: number of cells', adata.n_obs)
print('Loaded Normalized AnnData object: number of genes', adata.n_vars)
print('Available metadata for each cell: ', adata.obs.columns)
Loaded Normalized AnnData object: number of cells 27457
Loaded Normalized AnnData object: number of genes 13945
Available metadata for each cell: Index(['Cluster', 'Subcluster', 'Donor', 'Layer', 'Gestation_week', 'Index',
'Library', 'Number_genes_detected', 'Number_UMI',
'Percentage_mitochondrial', 'S_phase_score', 'G2M_phase_score', 'Phase',
'n_genes_by_counts', 'total_counts', 'total_counts_mito',
'pct_counts_mito', 'total_counts_ribo', 'pct_counts_ribo', 'n_genes',
'n_counts'],
dtype='object')
adata_check = adata.copy()
sc.pp.pca(adata_check, n_comps=50, use_highly_variable=True, svd_solver='arpack')
computing PCA
on highly variable genes
with n_comps=50
finished (0:00:02)
sc.pp.neighbors(adata_check, n_neighbors=30, n_pcs=25)
computing neighbors
using 'X_pca' with n_pcs = 25
finished: added to `.uns['neighbors']`
`.obsp['distances']`, distances for each pair of neighbors
`.obsp['connectivities']`, weighted adjacency matrix (0:00:27)
sc.tl.umap(adata_check)
computing UMAP
finished: added
'X_umap', UMAP coordinates (adata.obsm) (0:00:20)
sc.pl.umap(adata_check, color=['Donor', 'Layer'], size=10)
/usr/local/lib/python3.8/dist-packages/scanpy/plotting/_tools/scatterplots.py:392: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored
plt.rcParams['figure.dpi'] = 120
sc.pl.umap(adata_check, color=['Donor'], size=10)
del adata_check
sc.pp.pca(adata, n_comps=50, use_highly_variable=True, svd_solver='arpack')
computing PCA
on highly variable genes
with n_comps=50
finished (0:00:01)
sc.external.pp.harmony_integrate(adata, 'Donor')
2022-11-26 19:02:08,496 - harmonypy - INFO - Iteration 1 of 10
2022-11-26 19:02:17,481 - harmonypy - INFO - Iteration 2 of 10
2022-11-26 19:02:24,885 - harmonypy - INFO - Iteration 3 of 10
2022-11-26 19:02:33,193 - harmonypy - INFO - Iteration 4 of 10
2022-11-26 19:02:43,286 - harmonypy - INFO - Iteration 5 of 10
2022-11-26 19:02:47,592 - harmonypy - INFO - Iteration 6 of 10
2022-11-26 19:02:49,866 - harmonypy - INFO - Converged after 6 iterations
The new corrected PCs are saved in .obsm["X_pca_harmony"]
sc.pl.embedding(adata, basis="X_pca_harmony", color=['Donor'])
We compute the neighborhood graph of cells using the Harmony-corrected PCA representation of the data. This quantifies how similar each cell is to every other, distinguishing cells that are close from those that are not.
This step is a prerequisite for UMAP plotting and for clustering.
Key parameters:
The elbow plot helps determine how many PCs are needed to capture the majority of the variation in the data. It visualizes the explained variance of each PC; the point where the elbow appears is usually taken as the threshold capturing most of the variation. However, judging where the elbow lies can be somewhat subjective.
sc.settings.set_figure_params(dpi=80)
sc.pl.pca_variance_ratio(adata)
sc.pl.pca_variance_ratio(adata, log=True)
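As a numeric complement to eyeballing the elbow, one can pick the smallest number of PCs whose cumulative explained-variance ratio crosses a chosen threshold. A minimal sketch (the helper name and the 90% threshold are our own choices; with scanpy one would pass the values stored in `adata.uns['pca']['variance_ratio']`):

```python
import numpy as np

def pcs_for_variance(variance_ratio, threshold=0.90):
    """Smallest number of PCs whose cumulative explained-variance
    ratio reaches `threshold` (a numeric complement to the elbow plot)."""
    cum = np.cumsum(variance_ratio)
    return int(np.searchsorted(cum, threshold) + 1)

# Synthetic, decaying variance spectrum for illustration
ratios = np.array([0.40, 0.20, 0.10, 0.08, 0.07, 0.05, 0.04, 0.03, 0.02, 0.01])
print(pcs_for_variance(ratios, 0.90))  # 6 PCs reach 90% here
```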
From documentation:
The size of local neighborhood (in terms of number of neighboring data points) used for manifold approximation. Larger values result in more global views of the manifold, while smaller values result in more local data being preserved. In general values should be in the range 2 to 100.
- If knn is True (Default), number of nearest neighbors to be searched.
The k-nearest neighbor graph (k-NNG) is a graph in which two vertices p and q are connected by an edge, if the distance between p and q is among the k-th smallest distances from p to other objects from P.
- If knn is False, a Gaussian kernel width is set to the distance of the n_neighbors neighbor.
This transition matrix is computed using a nearest-neighbor graph whose edge weights have a Gaussian distribution with respect to the Euclidean distance in gene expression space; transition probabilities correspond to edge weights.
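The k-NN graph and Gaussian-kernel weighting described above can be sketched in plain NumPy on toy data (the variable names are illustrative, and scanpy's actual implementation differs in detail):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))   # 6 cells in a 3-dimensional (e.g. PC) space
k = 3

# Pairwise Euclidean distances, with self-distances excluded
d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
np.fill_diagonal(d, np.inf)

# Indices of the k nearest neighbours of each cell (the k-NNG edges)
nn_idx = np.argsort(d, axis=1)[:, :k]

# Gaussian kernel whose width is the distance of the k-th neighbour,
# mirroring the knn=False behaviour described above
sigma = np.take_along_axis(d, nn_idx[:, -1:], axis=1)
w = np.exp(-(d / sigma) ** 2)   # edge weights; 0 on the diagonal
```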
Overall, we test several neighborhood sizes and compare the resulting UMAP embeddings and Leiden clusterings:
neigh = [5, 20, 50, 80]
#neigh = [5, 10, 15, 20, 25]
#neigh = [3, 5, 7, 9, 11]
resolutions = [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2]
dict_neigh = {}
for x in neigh:
    dict_key = 'Neighbours_' + str(x)
    dict_neigh[dict_key] = []
    print('# neighbors:', x)
    sc.pp.neighbors(adata, n_neighbors=x, n_pcs=20, use_rep="X_pca_harmony", key_added="harmony")
    sc.tl.umap(adata, neighbors_key="harmony", random_state=1)
    sc.pl.umap(adata, color=['Donor', 'Cluster'],
               palette=sc.pl.palettes.vega_20_scanpy, size=8)
    # Cluster on the harmony graph at increasing resolutions and record
    # the number of clusters found at each one
    for res in resolutions:
        leiden_key = 'Leiden_{:02d}'.format(int(round(res * 10)))
        sc.tl.leiden(adata, resolution=res, key_added=leiden_key, neighbors_key='harmony')
        dict_neigh[dict_key].append(max(adata.obs[leiden_key].astype('int')) + 1)
    #color=['Leiden_03', 'Leiden_04', 'Leiden_05', 'Leiden_06', 'Leiden_07', 'Leiden_08', 'Leiden_09']
    sc.pl.umap(adata, color=['Leiden_04', 'Leiden_08', 'Leiden_12'],
               palette=sc.pl.palettes.vega_20_scanpy, size=8, legend_loc='on data')
# neighbors: 5
computing neighbors
finished: added to `.uns['harmony']`
`.obsp['harmony_distances']`, distances for each pair of neighbors
`.obsp['harmony_connectivities']`, weighted adjacency matrix (0:00:01)
computing UMAP
finished: added
'X_umap', UMAP coordinates (adata.obsm) (0:00:10)
running Leiden clustering
finished: found 8 clusters and added
'Leiden_03', the cluster labels (adata.obs, categorical) (0:00:11)
running Leiden clustering
finished: found 10 clusters and added
'Leiden_04', the cluster labels (adata.obs, categorical) (0:00:12)
running Leiden clustering
finished: found 11 clusters and added
'Leiden_05', the cluster labels (adata.obs, categorical) (0:00:10)
running Leiden clustering
finished: found 11 clusters and added
'Leiden_06', the cluster labels (adata.obs, categorical) (0:00:10)
running Leiden clustering
finished: found 12 clusters and added
'Leiden_07', the cluster labels (adata.obs, categorical) (0:00:09)
running Leiden clustering
finished: found 13 clusters and added
'Leiden_08', the cluster labels (adata.obs, categorical) (0:00:14)
running Leiden clustering
finished: found 17 clusters and added
'Leiden_09', the cluster labels (adata.obs, categorical) (0:00:19)
running Leiden clustering
finished: found 17 clusters and added
'Leiden_10', the cluster labels (adata.obs, categorical) (0:00:14)
running Leiden clustering
finished: found 18 clusters and added
'Leiden_11', the cluster labels (adata.obs, categorical) (0:00:16)
running Leiden clustering
finished: found 18 clusters and added
'Leiden_12', the cluster labels (adata.obs, categorical) (0:00:21)
# neighbors: 20
computing neighbors
finished: added to `.uns['harmony']`
`.obsp['harmony_distances']`, distances for each pair of neighbors
`.obsp['harmony_connectivities']`, weighted adjacency matrix (0:00:03)
computing UMAP
finished: added
'X_umap', UMAP coordinates (adata.obsm) (0:00:17)
running Leiden clustering
finished: found 8 clusters and added
'Leiden_03', the cluster labels (adata.obs, categorical) (0:00:10)
running Leiden clustering
finished: found 10 clusters and added
'Leiden_04', the cluster labels (adata.obs, categorical) (0:00:11)
running Leiden clustering
finished: found 11 clusters and added
'Leiden_05', the cluster labels (adata.obs, categorical) (0:00:09)
running Leiden clustering
finished: found 11 clusters and added
'Leiden_06', the cluster labels (adata.obs, categorical) (0:00:10)
running Leiden clustering
finished: found 12 clusters and added
'Leiden_07', the cluster labels (adata.obs, categorical) (0:00:08)
running Leiden clustering
finished: found 13 clusters and added
'Leiden_08', the cluster labels (adata.obs, categorical) (0:00:14)
running Leiden clustering
finished: found 17 clusters and added
'Leiden_09', the cluster labels (adata.obs, categorical) (0:00:17)
running Leiden clustering
finished: found 17 clusters and added
'Leiden_10', the cluster labels (adata.obs, categorical) (0:00:14)
running Leiden clustering
finished: found 18 clusters and added
'Leiden_11', the cluster labels (adata.obs, categorical) (0:00:15)
running Leiden clustering
finished: found 18 clusters and added
'Leiden_12', the cluster labels (adata.obs, categorical) (0:00:20)
# neighbors: 50
computing neighbors
finished: added to `.uns['harmony']`
`.obsp['harmony_distances']`, distances for each pair of neighbors
`.obsp['harmony_connectivities']`, weighted adjacency matrix (0:00:09)
computing UMAP
finished: added
'X_umap', UMAP coordinates (adata.obsm) (0:00:23)
running Leiden clustering
finished: found 8 clusters and added
'Leiden_03', the cluster labels (adata.obs, categorical) (0:00:10)
running Leiden clustering
finished: found 10 clusters and added
'Leiden_04', the cluster labels (adata.obs, categorical) (0:00:11)
running Leiden clustering
finished: found 11 clusters and added
'Leiden_05', the cluster labels (adata.obs, categorical) (0:00:09)
running Leiden clustering
finished: found 11 clusters and added
'Leiden_06', the cluster labels (adata.obs, categorical) (0:00:10)
running Leiden clustering
finished: found 12 clusters and added
'Leiden_07', the cluster labels (adata.obs, categorical) (0:00:08)
running Leiden clustering
finished: found 13 clusters and added
'Leiden_08', the cluster labels (adata.obs, categorical) (0:00:14)
running Leiden clustering
finished: found 17 clusters and added
'Leiden_09', the cluster labels (adata.obs, categorical) (0:00:17)
running Leiden clustering
finished: found 17 clusters and added
'Leiden_10', the cluster labels (adata.obs, categorical) (0:00:13)
running Leiden clustering
finished: found 18 clusters and added
'Leiden_11', the cluster labels (adata.obs, categorical) (0:00:15)
running Leiden clustering
finished: found 18 clusters and added
'Leiden_12', the cluster labels (adata.obs, categorical) (0:00:20)
# neighbors: 80
computing neighbors
finished: added to `.uns['harmony']`
`.obsp['harmony_distances']`, distances for each pair of neighbors
`.obsp['harmony_connectivities']`, weighted adjacency matrix (0:00:16)
computing UMAP
finished: added
'X_umap', UMAP coordinates (adata.obsm) (0:00:26)
running Leiden clustering
finished: found 8 clusters and added
'Leiden_03', the cluster labels (adata.obs, categorical) (0:00:10)
running Leiden clustering
finished: found 10 clusters and added
'Leiden_04', the cluster labels (adata.obs, categorical) (0:00:11)
running Leiden clustering
finished: found 11 clusters and added
'Leiden_05', the cluster labels (adata.obs, categorical) (0:00:09)
running Leiden clustering
finished: found 11 clusters and added
'Leiden_06', the cluster labels (adata.obs, categorical) (0:00:10)
running Leiden clustering
finished: found 12 clusters and added
'Leiden_07', the cluster labels (adata.obs, categorical) (0:00:08)
running Leiden clustering
finished: found 13 clusters and added
'Leiden_08', the cluster labels (adata.obs, categorical) (0:00:14)
running Leiden clustering
finished: found 17 clusters and added
'Leiden_09', the cluster labels (adata.obs, categorical) (0:00:17)
running Leiden clustering
finished: found 17 clusters and added
'Leiden_10', the cluster labels (adata.obs, categorical) (0:00:13)
running Leiden clustering
finished: found 18 clusters and added
'Leiden_11', the cluster labels (adata.obs, categorical) (0:00:16)
running Leiden clustering
finished: found 18 clusters and added
'Leiden_12', the cluster labels (adata.obs, categorical) (0:00:21)
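The cluster counts collected in `dict_neigh` are easier to compare side by side as a table. A sketch using the counts reported in the logs above (only two of the four neighborhood settings shown for brevity):

```python
import pandas as pd

# Cluster counts per Leiden resolution, as reported in the logs above
# (values for n_neighbors = 5 and 20; the real dict holds all four settings)
dict_neigh = {
    'Neighbours_5':  [8, 10, 11, 11, 12, 13, 17, 17, 18, 18],
    'Neighbours_20': [8, 10, 11, 11, 12, 13, 17, 17, 18, 18],
}
resolutions = [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2]

summary = pd.DataFrame(dict_neigh, index=resolutions)
summary.index.name = 'Leiden resolution'
print(summary)
```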
minkowski: a metric on a normed vector space that generalizes both the Euclidean and the Manhattan distance.
cosine: defined as the cosine of the angle between two vectors, which equals the inner product of the two vectors after both are normalized to length 1. It is thus a judgment of orientation, not magnitude.
jaccard: the ratio of intersection over union.
correlation: correlation-based distance (Pearson, Kendall, Spearman).
Benchmarking Paper: https://academic.oup.com/bib/article/20/6/2316/5077112
Distance-based metrics such as the Euclidean distance are sensitive to data scaling, whereas correlation-based metrics such as Pearson's correlation are invariant to it. This property makes correlation-based metrics robust to data noise and to the normalisation procedure.
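The scaling claim is easy to verify numerically: multiplying both vectors by a constant rescales their Euclidean distance but leaves the correlation distance unchanged (toy data; `corr_dist` is our own helper):

```python
import numpy as np

def corr_dist(a, b):
    # 1 - Pearson correlation: the 'correlation' metric tested above
    return 1 - np.corrcoef(a, b)[0, 1]

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = rng.normal(size=50)

eucl_before = np.linalg.norm(x - y)
eucl_after = np.linalg.norm(3 * x - 3 * y)     # same data, rescaled by 3
corr_before = corr_dist(x, y)
corr_after = corr_dist(3 * x, 3 * y)

print(np.isclose(eucl_after / eucl_before, 3.0))  # True: Euclidean scales with the data
print(np.isclose(corr_before, corr_after))        # True: correlation is scale-invariant
```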
dist = ['euclidean', 'l2', 'manhattan', 'l1', 'minkowski', 'cosine', 'jaccard', 'correlation'] #'cityblock'
dict_metric = {}
for x in dist:
    dict_key = 'Metric_' + x
    dict_metric[dict_key] = []
    print('Metric:', x)
    sc.pp.neighbors(adata, n_neighbors=80, n_pcs=20, metric=x, use_rep="X_pca_harmony", key_added="harmony")
    sc.tl.umap(adata, neighbors_key="harmony", random_state=1)
    sc.pl.umap(adata, color=['Donor', 'Cluster'])
Metric: euclidean
computing neighbors
finished: added to `.uns['harmony']`
`.obsp['harmony_distances']`, distances for each pair of neighbors
`.obsp['harmony_connectivities']`, weighted adjacency matrix (0:00:17)
computing UMAP
finished: added
'X_umap', UMAP coordinates (adata.obsm) (0:00:27)
Metric: l2
computing neighbors
finished: added to `.uns['harmony']`
`.obsp['harmony_distances']`, distances for each pair of neighbors
`.obsp['harmony_connectivities']`, weighted adjacency matrix (0:00:18)
computing UMAP
finished: added
'X_umap', UMAP coordinates (adata.obsm) (0:00:28)
Metric: manhattan
computing neighbors
finished: added to `.uns['harmony']`
`.obsp['harmony_distances']`, distances for each pair of neighbors
`.obsp['harmony_connectivities']`, weighted adjacency matrix (0:00:23)
computing UMAP
finished: added
'X_umap', UMAP coordinates (adata.obsm) (0:00:28)
Metric: l1
computing neighbors
finished: added to `.uns['harmony']`
`.obsp['harmony_distances']`, distances for each pair of neighbors
`.obsp['harmony_connectivities']`, weighted adjacency matrix (0:00:15)
computing UMAP
finished: added
'X_umap', UMAP coordinates (adata.obsm) (0:00:27)
Metric: minkowski
computing neighbors
finished: added to `.uns['harmony']`
`.obsp['harmony_distances']`, distances for each pair of neighbors
`.obsp['harmony_connectivities']`, weighted adjacency matrix (0:00:22)
computing UMAP
finished: added
'X_umap', UMAP coordinates (adata.obsm) (0:00:27)
Metric: cosine
computing neighbors
finished: added to `.uns['harmony']`
`.obsp['harmony_distances']`, distances for each pair of neighbors
`.obsp['harmony_connectivities']`, weighted adjacency matrix (0:00:23)
computing UMAP
finished: added
'X_umap', UMAP coordinates (adata.obsm) (0:00:27)
Metric: jaccard
computing neighbors
finished: added to `.uns['harmony']`
`.obsp['harmony_distances']`, distances for each pair of neighbors
`.obsp['harmony_connectivities']`, weighted adjacency matrix (0:00:18)
computing UMAP
finished: added
'X_umap', UMAP coordinates (adata.obsm) (0:03:20)
Metric: correlation
computing neighbors
finished: added to `.uns['harmony']`
`.obsp['harmony_distances']`, distances for each pair of neighbors
`.obsp['harmony_connectivities']`, weighted adjacency matrix (0:00:24)
computing UMAP
finished: added
'X_umap', UMAP coordinates (adata.obsm) (0:00:27)
sc.pp.neighbors(adata, n_neighbors=80, n_pcs=20, use_rep="X_pca_harmony", key_added="harmony")
#sc.pp.neighbors(adata, n_neighbors=80, n_pcs=15, use_rep="X_pca_harmony", key_added="harmony", metric='cosine')
computing neighbors
finished: added to `.uns['harmony']`
`.obsp['harmony_distances']`, distances for each pair of neighbors
`.obsp['harmony_connectivities']`, weighted adjacency matrix (0:00:16)
sc.tl.umap(adata, neighbors_key="harmony", random_state=1)
computing UMAP
finished: added
'X_umap', UMAP coordinates (adata.obsm) (0:00:28)
sc.pl.umap(adata, color=['Donor', 'Cluster'])
sc.tl.diffmap(adata, neighbors_key="harmony")
computing Diffusion Maps using n_comps=15(=n_dcs)
computing transitions
finished (0:00:00)
eigenvalues of transition matrix
[1. 0.99714315 0.99270475 0.99232537 0.9917588 0.9895217
0.9785628 0.9751901 0.97225225 0.9687998 0.9620149 0.9564335
0.944027 0.93468106 0.9310887 ]
finished: added
'X_diffmap', diffmap coordinates (adata.obsm)
'diffmap_evals', eigenvalues of transition matrix (adata.uns) (0:00:02)
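The eigenvalue spectrum printed above can guide the choice of the number of diffusion components: a large drop between consecutive eigenvalues suggests a natural cutoff. A sketch using the values scanpy stores in `adata.uns['diffmap_evals']`:

```python
import numpy as np

# Eigenvalues of the transition matrix reported above
# (stored by scanpy in adata.uns['diffmap_evals'])
evals = np.array([1.0, 0.99714315, 0.99270475, 0.99232537, 0.9917588,
                  0.9895217, 0.9785628, 0.9751901, 0.97225225, 0.9687998,
                  0.9620149, 0.9564335, 0.944027, 0.93468106, 0.9310887])

# Gaps between consecutive eigenvalues: the largest drop is a candidate cutoff
gaps = -np.diff(evals)
print(np.argmax(gaps) + 1)  # 12: the largest drop follows the 12th component
```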
sc.pl.diffmap(adata, color=['Donor', 'Cluster'],components=['2,3'])
# Takes quite some time
sc.tl.draw_graph(adata, neighbors_key="harmony")
drawing single-cell graph using layout 'fa'
finished: added
'X_draw_graph_fa', graph_drawing coordinates (adata.obsm) (0:04:46)
sc.pl.draw_graph(adata, color=['Donor', 'Cluster'])
adata.write(results_file)
print(datetime.now())
2022-11-26 19:33:35.922120
nb_fname = '3_1_UMAP_parameters'
nb_fname
'3_1_UMAP_parameters'
%%bash -s "$nb_fname"
jupyter nbconvert "$1".ipynb --to="python"
jupyter nbconvert "$1".ipynb --to="html"
[NbConvertApp] WARNING | pattern '3_1_UMAP_parameters.ipynb' matched no files
The URL prefix for reveal.js (version 3.x).
This defaults to the reveal CDN, but can be any url pointing to a copy
of reveal.js.
For speaker notes to work, this must be a relative path to a local
copy of reveal.js: e.g., "reveal.js".
If a relative path is given, it must be a subdirectory of the
current directory (from which the server is run).
See the usage documentation
(https://nbconvert.readthedocs.io/en/latest/usage.html#reveal-js-html-slideshow)
for more details.
Default: ''
Equivalent to: [--SlidesExporter.reveal_url_prefix]
--nbformat=<Enum>
The nbformat version to write.
Use this to downgrade notebooks.
Choices: any of [1, 2, 3, 4]
Default: 4
Equivalent to: [--NotebookExporter.nbformat_version]
Examples
--------
The simplest way to use nbconvert is
> jupyter nbconvert mynotebook.ipynb --to html
Options include ['asciidoc', 'custom', 'html', 'latex', 'markdown', 'notebook', 'pdf', 'python', 'qtpdf', 'qtpng', 'rst', 'script', 'slides', 'webpdf'].
> jupyter nbconvert --to latex mynotebook.ipynb
Both HTML and LaTeX support multiple output templates. LaTeX includes
'base', 'article' and 'report'. HTML includes 'basic', 'lab' and
'classic'. You can specify the flavor of the format used.
> jupyter nbconvert --to html --template lab mynotebook.ipynb
You can also pipe the output to stdout, rather than a file
> jupyter nbconvert mynotebook.ipynb --stdout
PDF is generated via latex
> jupyter nbconvert mynotebook.ipynb --to pdf
You can get (and serve) a Reveal.js-powered slideshow
> jupyter nbconvert myslides.ipynb --to slides --post serve
Multiple notebooks can be given at the command line in a couple of
different ways:
> jupyter nbconvert notebook*.ipynb
> jupyter nbconvert notebook1.ipynb notebook2.ipynb
or you can specify the notebooks list in a config file, containing::
c.NbConvertApp.notebooks = ["my_notebook.ipynb"]
> jupyter nbconvert --config mycfg.py
To see all available configurables, use `--help-all`.
[NbConvertApp] WARNING | pattern '3_1_UMAP_parameters.ipynb' matched no files
---------------------------------------------------------------------------
CalledProcessError                        Traceback (most recent call last)
Cell In [38], line 1
----> 1 get_ipython().run_cell_magic('bash', '-s "$nb_fname"', 'jupyter nbconvert "$1".ipynb --to="python"\njupyter nbconvert "$1".ipynb --to="html"\n')

File /usr/local/lib/python3.8/dist-packages/IPython/core/interactiveshell.py:2362, in InteractiveShell.run_cell_magic(self, magic_name, line, cell)
   2360 with self.builtin_trap:
   2361     args = (magic_arg_s, cell)
-> 2362     result = fn(*args, **kwargs)
   2363 return result

File /usr/local/lib/python3.8/dist-packages/IPython/core/magics/script.py:153, in ScriptMagics._make_script_magic.<locals>.named_script_magic(line, cell)
    151 else:
    152     line = script
--> 153 return self.shebang(line, cell)

File /usr/local/lib/python3.8/dist-packages/IPython/core/magics/script.py:305, in ScriptMagics.shebang(self, line, cell)
    300 if args.raise_error and p.returncode != 0:
    301     # If we get here and p.returncode is still None, we must have
    302     # killed it but not yet seen its return code. We don't wait for it,
    303     # in case it's stuck in uninterruptible sleep. -9 = SIGKILL
    304     rc = p.returncode or -9
--> 305     raise CalledProcessError(rc, cell)

CalledProcessError: Command 'b'jupyter nbconvert "$1".ipynb --to="python"\njupyter nbconvert "$1".ipynb --to="html"\n'' returned non-zero exit status 255.
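The traceback is a downstream symptom of the earlier WARNING: the pattern `3_1_UMAP_parameters.ipynb` matched no files, so each `jupyter nbconvert` call printed its help text and exited with a non-zero status, which the `%%bash` magic re-raised as `CalledProcessError`. A minimal defensive sketch (the notebook filename below is a placeholder, not the real name of this file on disk) checks that the notebook exists before shelling out to nbconvert:

```python
from pathlib import Path
import subprocess

def export_notebook(nb: Path) -> str:
    """Export a notebook to .py and .html, but only if it actually exists on disk."""
    if not nb.exists():
        # Avoid the 'pattern matched no files' failure seen above.
        return f"{nb} not found; fix the filename before exporting"
    for fmt in ("python", "html"):
        # check=True mirrors the %%bash behaviour of raising on a non-zero exit.
        subprocess.run(["jupyter", "nbconvert", str(nb), f"--to={fmt}"], check=True)
    return f"exported {nb}"

# Hypothetical filename: substitute the actual name of this notebook.
print(export_notebook(Path("3_1_UMAP_parameters.ipynb")))
```

Checking existence first turns a noisy help-text dump and traceback into a single readable message, which is easier to act on when the export cell sits at the end of a long notebook run.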